Steering of instructions in a computer system.
A computing system includes a main memory (11), an instruction cache (13) and a processor (12). The processor (12) includes memory interface means (26), predecoding means (40,41,43,44), interface means (24), a first arithmetic logic unit (36), a second arithmetic logic unit (37) and steering means (34,45). The memory interface means (26) is connected to the main memory (11) and fetches instructions from the main memory (11). The predecoding means (40,41,43,44) is connected to the memory interface means (26) and predecodes the instructions to generate predecode bits. The predecode bits indicate whether and how the instructions may be bundled. The interface means (24) is connected to the predecoding means (40,41,43,44) and the instruction cache (13). The interface means (24) stores the instructions and the predecode bits in the instruction cache (13) and fetches the instructions from the instruction cache (13) with the predecode bits. The steering means (34,45) is connected to the interface means (24), the first arithmetic logic unit (36) and the second arithmetic logic unit (37). The steering means (34,45) steers each of the instructions to one of the first integer arithmetic logic unit (36) and the second integer arithmetic logic unit (37) for execution, the steering means (34,45) utilizing the predecode bits to steer the instructions.
The present invention concerns predecoding and steering instructions executed in a computer system, for example a superscalar processor.

Most modern computer systems include a central processing unit (CPU) and a main memory. The speed at which the CPU can decode and execute instructions and operands depends upon the rate at which the instructions and operands can be transferred from main memory to the CPU. In an attempt to reduce the time required for the CPU to obtain instructions and operands from main memory, many computer systems include a cache memory between the CPU and main memory.

A cache memory is a small, high-speed buffer memory which is used to hold temporarily those portions of the contents of main memory which are expected to be used by the CPU in the near future. The main purpose of a cache memory is to shorten the time necessary to perform memory accesses, either for data or for instruction fetch. Information located in cache memory may be accessed in much less time than information located in main memory. Thus, a CPU with a cache memory needs to spend far less time waiting for instructions and operands to be fetched and/or stored.

A cache memory is made up of many blocks of one or more words of data. Each block has associated with it an address tag that uniquely identifies which block of main memory it is a copy of. Each time the processor makes a memory reference, an address tag comparison is made to see if a copy of the requested data resides in the cache memory. If the desired memory block is not in the cache memory, the block is retrieved from the main memory, stored in the cache memory and supplied to the processor. A cache memory used to store instructions is generally referred to as an instruction cache. A program counter is used to determine which instructions are to be fetched for execution.
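The tag comparison described above can be illustrated with a toy model. The sketch below assumes a direct-mapped organization, word addressing and a Python list standing in for main memory; the class name `DirectMappedCache` and its parameters are hypothetical and are not part of the described system.

```python
from typing import List, Optional, Tuple


class DirectMappedCache:
    """Toy direct-mapped cache; each line holds (tag, block) or None."""

    def __init__(self, num_lines: int, block_words: int, memory: List[int]):
        self.num_lines = num_lines
        self.block_words = block_words
        self.memory = memory  # stands in for main memory, one int per word
        self.lines: List[Optional[Tuple[int, List[int]]]] = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        block_num, offset = divmod(addr, self.block_words)
        tag, index = divmod(block_num, self.num_lines)
        line = self.lines[index]
        if line is not None and line[0] == tag:  # address tag comparison
            self.hits += 1
            return line[1][offset]
        # Miss: retrieve the block from main memory, store it, then supply the word.
        self.misses += 1
        start = block_num * self.block_words
        block = self.memory[start:start + self.block_words]
        self.lines[index] = (tag, block)
        return block[offset]
```

For example, with `memory=list(range(64))`, four lines and two-word blocks, reading address 5 misses and loads words 4 and 5, so a following read of address 4 hits.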

In some computer systems, parallel execution of instructions (called "bundling" of instructions) may be utilized to speed up computer operation. Processors which provide for parallel execution of instructions can be referred to as superscalar processors. Superscalar computers generally utilize more than one execution unit to provide for bundling of instructions. An execution unit is, for example, an arithmetic logic unit (ALU) or a floating point unit (FPU).

Even with multiple execution units, there are still limitations on which instructions may be bundled. For example, some instructions may conflict with other instructions. The conflict can take various forms. A resource conflict occurs when two instructions both use the same, limited processor resource. This may occur, for example, when both instructions require use of the same execution unit. Alternatively, a data dependency may result in a conflict. That is, when one instruction produces a result to be used by a next instruction, the two instructions cannot be bundled. Also, a procedural dependency may result in a conflict. For example, an instruction which follows a branch instruction cannot be bundled with the branch instruction, since execution of the instruction depends on whether the branch is taken. In order to determine whether two or more given instructions can be bundled, it is generally necessary to first decode the instructions. This may be done, for example, by an instruction decode unit.

Various methods have been advanced for minimizing the performance penalty for decoding and steering instructions to the proper execution unit. For example, compiler techniques may be used to assist the instruction decode unit in determining whether two or more instructions can be bundled. That is, during compile time, the compiler can encode one or more bits in the actual instruction operational code (op-code) to be utilized by the instruction decode/steering hardware. These bits can provide information to the decode hardware as to how the instruction may be bundled with other instructions. The predecode information, in effect, is employed as part of the instruction set architecture. However, the information needed by the decode hardware is processor dependent; therefore, such an encoding of bits can limit the flexibility of different processors to optimally execute op-code without a code recompile.

In one system, a dedicated predecode bit stored in the instruction cache is used by decode hardware to steer instructions to either an integer arithmetic logic unit (ALU) or a floating point unit (FPU). See, for example, E. DeLano, W. Walker, J. Yetter, M. Forsyth, "A High Speed Superscalar PA-RISC Processor", IEEE, 1992, pp. 116-121.

The present invention seeks to provide steering of instructions in a computer system.

According to an aspect of the present invention there is provided a method of steering instructions as specified in claim 1.

According to another aspect of the present invention there is provided a computer system as specified in claim 6.

In accordance with the preferred embodiment of the present invention, a computer system includes a main memory, an instruction cache and a processor. The processor includes memory interface means, predecoding means, interface means, a first arithmetic logic unit, a second arithmetic logic unit and steering means. The memory interface means is connected to the main memory and fetches instructions from the main memory. In the preferred embodiment, the memory interface means fetches the instructions from the main memory two at a time in a double word.

Preferably, the predecoding means is connected to the memory interface means and predecodes the instructions to generate predecode bits. The predecode bits indicate whether and how the instructions may be bundled. In the preferred embodiment, the predecode bits identify, for each pair of bundled instructions, to which of the first integer arithmetic logic unit and the second integer arithmetic logic unit a particular instruction is to be steered. The predecoding means may include three predecode registers. A first predecode register holds an even word instruction of an instruction pair currently being decoded. A second predecode register holds an odd word instruction of the instruction pair currently being decoded. A third predecode register holds an odd word instruction of an instruction pair previously predecoded.
The interface means is preferably connected to the predecoding means and the instruction cache. The interface means stores the instructions and the predecode bits in the instruction cache and fetches the instructions from the instruction cache with the predecode bits. The steering means is connected to the interface means, the first arithmetic logic unit and the second arithmetic logic unit. The steering means steers each of the instructions to one of the first integer arithmetic logic unit and the second integer arithmetic logic unit for execution. The steering means utilizes the predecode bits to steer the instructions. In the preferred embodiment, the steering means includes a state machine. A current state of the state machine determines which of the predecode bits the steering means utilizes to steer the instructions.

In the preferred embodiment of the present invention, the processor also includes a floating point unit connected to the steering means. The steering means steers floating point instructions to the floating point unit. Also in the preferred embodiment, the predecode bits generated by the predecoding means indicate whether two consecutive instructions may be bundled for execution. Additionally, the predecode bits generated by the predecoding means indicate whether two consecutive instructions which may be bundled for execution are non-aligned or aligned.

The preferred embodiment of the present invention can provide efficient bundling and steering of instructions in a superscalar processor.

An embodiment of the present invention is described below, by way of example only, with reference to the accompanying drawings, in which:

Figure 1 shows a simplified block diagram of a computer system with an instruction cache and a data cache, in accordance with a preferred embodiment of the present invention.

Figure 2 shows a simplified block diagram of a processor shown in Figure 1, in accordance with a preferred embodiment of the present invention.

Figure 3 shows a simplified block diagram of the logical blocks pertaining to predecoding and steering of instructions within the processor shown in Figure 1, in accordance with a preferred embodiment of the present invention.

Figure 4 is a state diagram for a state machine shown in Figure 3, in accordance with the preferred embodiment of the present invention.

Figure 1 shows a simplified block diagram of a computer system. A processor 12 and a memory 11 are shown connected to a bus 10. Processor 12 utilizes an instruction cache 13 and a data cache 14. Instruction cache 13 stores instructions for processor 12 in static random access memory (SRAM). Data cache 14 stores data for processor 12 in SRAM.

Figure 2 shows a simplified block diagram of processor 12. Processor 12 is shown to include system bus interface logic 26, instruction cache interface logic 24, data cache interface logic 25, an arithmetic logic unit (ALU) 22, a translation look-aside buffer (TLB) 21, and an assist cache 23. System bus interface logic 26 provides processor 12 with an interface to system bus 10. Instruction cache interface logic 24 provides processor 12 with an interface to instruction cache 13. Data cache interface logic 25 provides processor 12 with an interface to data cache 14. Assist cache 23 is used in parallel with data cache 14 to provide data to arithmetic logic unit 22. Translation look-aside buffer 21 is used to map virtual addresses to real addresses in order to generate cache tags to be used to access data stored within assist cache 23 and within data cache 14.

Figure 3 is a simplified block diagram of the logical blocks pertaining to predecoding and steering instructions within processor 12. In the preferred embodiment of the present invention, system bus interface 26 implements 64-bit wide double-word transfers between memory 11 and processor 12. Each double word contains two 32-bit instructions. The instruction in a double word which occupies the high order bits (bits [0:31]) of the double word is referred to as the even word instruction. The instruction in a double word which occupies the low order bits (bits [32:63]) of the double word is referred to as the odd word instruction.
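Splitting a double word into its two instructions takes only a shift and a mask. This minimal sketch assumes the double word arrives as an unsigned 64-bit Python integer and uses the document's bit numbering, in which bit 0 is the most significant bit; the function name is hypothetical.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def split_double_word(dw: int) -> Tuple[int, int]:
    """Return (even_word, odd_word) for a 64-bit double word.

    Bits [0:31] (bit 0 = most significant) hold the even word instruction,
    bits [32:63] hold the odd word instruction.
    """
    if not 0 <= dw <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("double word must be an unsigned 64-bit value")
    return (dw >> 32) & MASK32, dw & MASK32
```

For instance, `split_double_word(0x1122334455667788)` gives `(0x11223344, 0x55667788)`.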

When words are retrieved from memory 11 and forwarded to instruction cache 13 along data path 54, predecode logic 44 generates predecode bits, placed on a data path 55, to be stored with each double word. The nature and function of the predecode bits are further described below.

Predecode logic 44 generates the predecode bits based on information in the double word. When a double word is fetched from memory, the even word instruction, through a data path 60, is placed in an even word instruction register 40. The odd word instruction, through a data path 61, is placed in an odd word instruction register 41. When a next double word is fetched from memory, the new even word instruction is placed in even word instruction register 40. The new odd word instruction is placed in odd word instruction register 41. The odd word instruction formerly in odd word instruction register 41 is moved to an odd word instruction register 43. As will be further described below, predecode logic 44 generates the predecode bits placed on data path 55 on the basis of the instructions in even word instruction register 40, odd word instruction register 41 and odd word instruction register 43.
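The register shuffling described above can be modelled as a generator over fetched double words. This is only a sketch: instructions are represented by arbitrary labels, and the function name `predecode_pairs` is hypothetical. For each fetch it yields the aligned pair held in registers 40 and 41 and the non-aligned pair held in registers 43 and 40.

```python
from typing import Iterable, Iterator, Optional, Tuple


def predecode_pairs(
    double_words: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[Tuple[str, str], Tuple[Optional[str], str]]]:
    """Yield (aligned pair, non-aligned pair) for each fetched double word.

    aligned     = (register 40, register 41): even and odd word of this fetch
    non-aligned = (register 43, register 40): previous odd word, this even word
    """
    reg43: Optional[str] = None  # empty until a second double word arrives
    for even, odd in double_words:
        reg40, reg41 = even, odd
        yield (reg40, reg41), (reg43, reg40)
        reg43 = reg41  # the odd word moves from register 41 to register 43
```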

Instruction cache interface 24 stores double words received on data path 54 together with predecode bits on data path 55 into instruction cache 13. Address lines 51 are used to address memory locations in instruction cache 13. A sixty-four bit wide data path 52 is used to transfer double-word instructions between processor 12 and instruction cache 13. Predecode bits stored with a double word are transferred simultaneously with the double word between instruction cache 13 and processor 12 along a data path 53. Instruction cache 13 stores the predecode bits along with the associated double word.

When a double word is retrieved by processor 12 from instruction cache 13 for execution of instructions within the double word, the even word instruction is placed in a received even word instruction register 30, and the odd word instruction is placed in a received odd word instruction register 31. When a next double word is retrieved by processor 12 from instruction cache 13, the new even word instruction is placed in received even word instruction register 30, and the new odd word instruction is placed in received odd word instruction register 31. The even word instruction formerly in received even word instruction register 30 is moved to a saved even word instruction register 32. The odd word instruction formerly in received odd word instruction register 31 is moved to a saved odd word instruction register 33.
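A similar shift happens on the steering side. The sketch below, with hypothetical class and method names and labels in place of real instructions, shows which register pairs steering logic 34 can draw on after each retrieval: the aligned pair in registers 30 and 31, the non-aligned pair formed by saved odd word register 33 and received even word register 30, and the saved aligned pair in registers 32 and 33.

```python
from typing import Optional, Tuple


class SteeringRegisters:
    """Received (30, 31) and saved (32, 33) instruction registers."""

    def __init__(self) -> None:
        self.r30: Optional[str] = None  # received even word
        self.r31: Optional[str] = None  # received odd word
        self.r32: Optional[str] = None  # saved even word
        self.r33: Optional[str] = None  # saved odd word

    def receive(self, even: str, odd: str) -> None:
        # The previous received pair moves into the saved registers first.
        self.r32, self.r33 = self.r30, self.r31
        self.r30, self.r31 = even, odd

    def aligned(self) -> Tuple[Optional[str], Optional[str]]:
        return self.r30, self.r31

    def non_aligned(self) -> Tuple[Optional[str], Optional[str]]:
        # Older odd word first, then the newer even word.
        return self.r33, self.r30

    def saved_aligned(self) -> Tuple[Optional[str], Optional[str]]:
        return self.r32, self.r33
```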

Steering logic 34 forwards instructions in received even word instruction register 30, saved even word instruction register 32, received odd word instruction register 31 and saved odd word instruction register 33 to either an arithmetic logic unit (ALU) 36, an ALU 37 or a floating point unit (FPU) 35 for execution. Steering logic 34 makes the decision based on predecoded bits received on data path 56 as well as state information received from a dual state machine 45. In the preferred embodiment, steering logic 34 also looks at a single bit from saved odd word instruction register 33 to see whether this is a floating point instruction or not.

In the preferred embodiment of the present invention there are six categories of instructions. The first category is load/store (ldst) instructions. Execution of ldst instructions results in information being loaded from or stored to memory/cache. This first category includes, for example, instructions which load or store integers as well as floating point numbers.

The second category is arithmetic/logic (alu) instructions. The second category includes, for example, instructions which perform an add, a subtract, a logical "OR" and a logical "AND".

The third category is mask/merge/shift (mms) instructions. The third category includes, for example, instructions which deposit, extract, and shift data within one or more registers.

The fourth category is floating point (flop) instructions. The fourth category includes, for example, instructions which add, multiply, divide and perform square roots on floating point numbers.

The fifth category is branch (br) instructions. The fifth category includes, for example, instructions which compare and branch, add and branch, and branch and link.

The sixth category is system (sys) instructions. The sixth category includes, for example, instructions which insert TLB values, flush the data cache, move to/from control registers, and move to/from space registers.

In the preferred embodiment of the present invention, FPU 35, ALU 36 and ALU 37 each execute only instructions in certain categories. Specifically, FPU 35 executes only instructions in the fourth category (flop instructions). ALU 36 executes instructions in the second category (alu instructions), in the third category (mms instructions) and in the fifth category (br instructions). ALU 37 executes instructions in the first category (ldst instructions) and in the second category (alu instructions). Instructions in the sixth category (sys instructions) require both ALU 36 and ALU 37 to execute them.
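The category-to-unit assignment above can be written down as data, which makes resource conflicts easy to test. In this sketch (all names hypothetical), each category maps to the list of unit sets it may occupy, with a sys instruction occupying both ALUs at once. `units_compatible` checks only resource conflicts, not data or procedural dependencies.

```python
from typing import Dict, FrozenSet, List

# One entry per way of executing the category; sys needs both ALUs together.
UNIT_OPTIONS: Dict[str, List[FrozenSet[str]]] = {
    "ldst": [frozenset({"ALU37"})],
    "alu": [frozenset({"ALU36"}), frozenset({"ALU37"})],
    "mms": [frozenset({"ALU36"})],
    "flop": [frozenset({"FPU35"})],
    "br": [frozenset({"ALU36"})],
    "sys": [frozenset({"ALU36", "ALU37"})],
}


def units_compatible(cat_a: str, cat_b: str) -> bool:
    """True if the two categories can run in the same cycle on disjoint units."""
    return any(not (a & b) for a in UNIT_OPTIONS[cat_a] for b in UNIT_OPTIONS[cat_b])
```

Two alu instructions can share a cycle because one can use each ALU, whereas mms and br both need ALU 36 and so cannot.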

In the preferred embodiment of the present invention, for every double word of two instructions, predecode logic 44 generates six predecode bits. The predecode bits indicate alignment and bundling of instructions. When aligned instructions are bundled, this means that the instruction in the even word of the current double word is to be executed simultaneously with the instruction in the odd word of the current double word. When non-aligned instructions are bundled, this means that the instruction in the even word of the current double word is to be executed simultaneously with the instruction in the odd word of the previous double word.

The first (bit 0) predecode bit (EFLOP), when set, indicates that the even word instruction is a floating point operation for an aligned double word. The second (bit 1) predecode bit (AL02), when set, indicates that the two aligned instructions in the double word are bundled and the odd word instruction is steered to ALU 37. The third (bit 2) predecode bit (AL01), when set, indicates that the two aligned instructions in the double word are bundled and the odd word instruction is steered to ALU 36. The fourth (bit 3) predecode bit (NLE2), when set, indicates that the two non-aligned instructions are bundled and the even word instruction is steered to ALU 37. The fifth (bit 4) predecode bit (NLE1), when set, indicates that the two non-aligned instructions are bundled and the even word instruction is steered to ALU 36. The sixth (bit 5) predecode bit (ALDUAL), when set, indicates that the two aligned instructions in the double word are bundled.
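One way to picture the six predecode bits stored with each double word is as a small packed field. The sketch below assumes bit 0 is the most significant of the six, following the numbering used for the instruction words (where bit 0 is the high-order bit); the packing and the helper names are illustrative assumptions, not the stored format.

```python
from typing import Dict

# Predecode bit names in order, bit 0 first.
PREDECODE_BITS = ("EFLOP", "AL02", "AL01", "NLE2", "NLE1", "ALDUAL")


def pack(flags: Dict[str, bool]) -> int:
    """Pack named flags into a 6-bit value with bit 0 as the most significant bit."""
    value = 0
    for name in PREDECODE_BITS:
        value = (value << 1) | int(bool(flags.get(name, False)))
    return value


def unpack(value: int) -> Dict[str, bool]:
    """Inverse of pack()."""
    width = len(PREDECODE_BITS)
    return {name: bool((value >> (width - 1 - i)) & 1)
            for i, name in enumerate(PREDECODE_BITS)}
```

With this ordering, the packed value reads left to right in the same column order as Table 3 (EFLOP, AL02, AL01, NLE2, NLE1, ALDUAL); for example, AL02 and ALDUAL set gives `0b010001`.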

Encoding of the predecode bits is performed by predecode logic 44 as follows. When a double word is fetched from memory, the even word instruction is placed in even word instruction register 40. The odd word instruction is placed in odd word instruction register 41. Within a single instruction cycle, predecode logic 44 generates predecode bits which apply to the aligned double word consisting of the even word instruction placed in even word instruction register 40 and the odd word instruction placed in odd word instruction register 41. The generated predecode bits also apply to the non-aligned double word consisting of the odd word instruction placed in odd word instruction register 43 and the even word instruction placed in even word instruction register 40. The generated predecode bits are forwarded to instruction cache interface 24 to be stored in instruction cache 13 with the double word originally fetched from memory.

Predecode logic 44 sets the EFLOP bit when the even word instruction placed in even word instruction register 40 is a floating point instruction.

Predecode logic 44 sets the AL02 bit when the odd word instruction placed in odd word instruction register 41 is a load/store instruction or an alu operation instruction. However, for the bit to be set, there can be no dependencies between the even word instruction placed in even word instruction register 40 and the odd word instruction placed in odd word instruction register 41. There are three dependencies which prevent the setting of the AL02 bit. The first dependency is a register set/use dependency which occurs, for example, when the even word instruction in even word register 40 sets a particular register and the odd word instruction in odd word register 41 uses that register. The second dependency is a carry/borrow set/use dependency which occurs, for example, when the even word instruction in even word register 40 sets a carry/borrow bit and the odd word instruction in odd word register 41 uses the carry/borrow bit. The third dependency is a branch/system dependency which occurs, for example, when the even word instruction in even word register 40 is a branch or a system instruction. An instruction following a branch cannot be bundled with the branch instruction. Nothing can be bundled with a system instruction.
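The three dependency checks that block the AL02 bit can be sketched as follows. The `Instr` record is a hypothetical summary of a decoded instruction (its category, the registers it writes and reads, and whether it sets or uses the carry/borrow bit). Treating a sys instruction in the odd word as blocking as well is an assumption drawn from the statement that nothing can be bundled with a system instruction.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded-instruction summary; the fields are illustrative."""
    cat: str                                # "ldst", "alu", "mms", "flop", "br" or "sys"
    sets: FrozenSet[int] = frozenset()      # general registers written
    uses: FrozenSet[int] = frozenset()      # general registers read
    sets_carry: bool = False                # writes the carry/borrow bit
    uses_carry: bool = False                # reads the carry/borrow bit


def has_dependency(first: Instr, second: Instr) -> bool:
    """True if `second` cannot issue together with the earlier `first`."""
    if first.sets & second.uses:                      # register set/use
        return True
    if first.sets_carry and second.uses_carry:        # carry/borrow set/use
        return True
    if first.cat in ("br", "sys") or second.cat == "sys":  # branch/system
        return True
    return False


def al02(even: Instr, odd: Instr) -> bool:
    """AL02 rule as stated: odd word is ldst or alu and independent of the even word."""
    return odd.cat in ("ldst", "alu") and not has_dependency(even, odd)
```

For example, an add that writes register 3 followed by a load that reads register 3 leaves AL02 clear, while the same add followed by a load using unrelated registers sets it.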

Predecode logic 44 sets the AL01 bit when the odd word instruction placed in odd word instruction register 41 is an mms instruction, a branch instruction or an alu operation instruction. However, for the bit to be set, there can be no dependencies between the even word instruction placed in even word instruction register 40 and the odd word instruction placed in odd word instruction register 41.

Predecode logic 44 sets the NLE2 bit when the even word instruction placed in even word instruction register 40 is a load/store instruction or an alu operation instruction. However, for the bit to be set, there can be no dependencies between the odd word instruction placed in odd word instruction register 43 and the even word instruction placed in even word instruction register 40.

Predecode logic 44 sets the NLE1 bit when the even word instruction placed in even word instruction register 40 is an mms instruction, a branch instruction or an alu operation instruction. However, for the bit to be set, there can be no dependencies between the odd word instruction placed in odd word instruction register 43 and the even word instruction placed in even word instruction register 40.

Predecode logic 44 sets the ALDUAL bit when the even word instruction placed in even word instruction register 40 may be bundled with the odd word instruction placed in odd word instruction register 41. However, for the bit to be set, there can be no dependencies between the even word instruction placed in even word instruction register 40 and the odd word instruction placed in odd word instruction register 41. The ALDUAL bit is not used for steering.
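The six predecoding rules above can be combined into one function. This sketch is hedged in three ways. Instructions are reduced to their categories, so only the branch/system dependency is modelled (register and carry checks are omitted). A resource check based on the category-to-unit assignment is added, since a pair such as mms/mms would need ALU 36 twice. And where an alu instruction could go to either ALU, ALU 36 is preferred, a tie-break inferred from the steering tables rather than stated in the text. All names are hypothetical. The output string lists the bits in the same order as the columns of Table 3.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Unit options per category; sys needs both ALUs at once.
OPTIONS: Dict[str, List[FrozenSet[str]]] = {
    "ldst": [frozenset({"ALU37"})],
    "alu": [frozenset({"ALU36"}), frozenset({"ALU37"})],
    "mms": [frozenset({"ALU36"})],
    "br": [frozenset({"ALU36"})],
    "flop": [frozenset({"FPU35"})],
    "sys": [frozenset({"ALU36", "ALU37"})],
}


def blocked(first: str, second: str) -> bool:
    # Branch/system dependency only; register and carry checks are omitted.
    return first in ("br", "sys") or second == "sys"


def can_bundle(first: str, second: str) -> bool:
    """ALDUAL-style test: no dependency and some pair of disjoint units."""
    if blocked(first, second):
        return False
    return any(not (a & b) for a in OPTIONS[first] for b in OPTIONS[second])


def fits(cat: str, unit: str, partner: str) -> bool:
    """`cat` can run alone on `unit` while `partner` still has a free unit."""
    return (frozenset({unit}) in OPTIONS[cat]
            and any(unit not in opt for opt in OPTIONS[partner]))


def steer_flags(first: str, second: str) -> Tuple[bool, bool]:
    """(to ALU 36, to ALU 37) for the later instruction `second`."""
    if blocked(first, second):
        return False, False
    to36 = fits(second, "ALU36", first)
    # Assumed tie-break, inferred from the steering tables: prefer ALU 36.
    to37 = fits(second, "ALU37", first) and not to36
    return to36, to37


def predecode(prev_odd: Optional[str], even: str, odd: str) -> str:
    """Six predecode bits as a string, bit 0 (EFLOP) first."""
    al01, al02 = steer_flags(even, odd)          # aligned: register 40 then 41
    nle1, nle2 = (steer_flags(prev_odd, even)    # non-aligned: 43 then 40
                  if prev_odd else (False, False))
    bits = (even == "flop", al02, al01, nle2, nle1, can_bundle(even, odd))
    return "".join("1" if b else "0" for b in bits)
```

Feeding it the seven double words of Table 3 in order, carrying each odd word forward as `prev_odd`, yields 010001, 000010, 101001, 000100, 000000, 101001 and 000000.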

Figure 4 shows a state diagram for dual state machine 45. Whenever there is a branch to an even word instruction, dual state machine 45 enters a state 101. As long as instructions from double words retrieved from instruction cache 13 are bundled, dual state machine 45 stays in state 101. When two instructions in a double word are not bundled, dual state machine 45 enters a state 102. As long as non-aligned instructions retrieved from instruction cache 13 are bundled, dual state machine 45 stays in state 102. When non-aligned instructions are not bundled, dual state machine 45 enters a state 103. If the next aligned instructions, held in saved even word instruction register 32 and saved odd word instruction register 33, are bundled, dual state machine 45 enters state 101. When, in state 103, the next aligned instructions in saved even word instruction register 32 and saved odd word instruction register 33 are not bundled, dual state machine 45 enters state 102.

Whenever there is a branch to an odd word instruction, dual state machine 45 enters a state 104. In state 104, instructions may not be bundled. After execution of the odd word instruction, dual state machine 45 enters state 101.
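The transitions of Figure 4 can be summarized as a small transition function. The sketch uses the state numbers from the figure; the function names, and the single `bundled` flag whose meaning depends on the current state as described above, are assumptions made for illustration.

```python
def on_branch(target_is_odd: bool) -> int:
    """State entered on a branch: 104 for an odd-word target, else 101."""
    return 104 if target_is_odd else 101


def next_state(state: int, bundled: bool) -> int:
    """Next state of dual state machine 45 after a decode cycle.

    `bundled` means: aligned pair bundled (state 101), non-aligned pair
    bundled (state 102), saved aligned pair in registers 32/33 bundled
    (state 103). It is ignored in state 104.
    """
    if state == 101:
        return 101 if bundled else 102
    if state == 102:
        return 102 if bundled else 103
    if state == 103:
        return 101 if bundled else 102
    if state == 104:
        return 101  # odd word issues alone, then aligned decoding resumes
    raise ValueError(f"unknown state {state}")
```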

Steering logic 34 steers instructions in received even word instruction register 30, saved even word instruction register 32, received odd word instruction register 31 and saved odd word instruction register 33 to either ALU 36, ALU 37 or FPU 35 based on predecoded bits received on data path 56 as well as state information received from dual state machine 45. In the preferred embodiment, steering logic 34 also looks at a single bit from saved odd word instruction register 33 to see whether this is a floating point instruction or not.

Steering Table 1 shows which instructions are executed by which of ALU 36, ALU 37 or FPU 35 when dual 
state machine 45 is in state 101 or state 103 and the aligned instructions are bundled. 
Steering Table 1

Instr. Register    Predecoded Bits                   Execution Unit
Even 30  Odd 31    EFLOP  AL01  AL02  NLE1  NLE2     ALU 36  ALU 37  FPU 35
e_alu    o_alu     0      1     0     X     X        o_alu   e_alu   X
e_alu    o_mms     0      1     0     X     X        o_mms   e_alu   X
e_alu    o_br      0      1     0     X     X        o_br    e_alu   X
e_alu    o_ldst    0      0     1     X     X        e_alu   o_ldst  X
e_alu    o_flop    0      0     0     X     X        e_alu   X       o_flop
e_ldst   o_alu     0      1     0     X     X        o_alu   e_ldst  X
e_ldst   o_mms     0      1     0     X     X        o_mms   e_ldst  X
e_ldst   o_br      0      1     0     X     X        o_br    e_ldst  X
e_ldst   o_flop    0      0     0     X     X        X       e_ldst  o_flop
e_flop   o_alu     1      1     0     X     X        o_alu   X       e_flop
e_flop   o_mms     1      1     0     X     X        o_mms   X       e_flop
e_flop   o_br      1      1     0     X     X        o_br    X       e_flop
e_flop   o_ldst    1      0     1     X     X        X       o_ldst  e_flop
e_mms    o_alu     0      0     1     X     X        e_mms   o_alu   X
e_mms    o_ldst    0      0     1     X     X        e_mms   o_ldst  X
e_mms    o_flop    0      0     0     X     X        e_mms   X       o_flop


The first column of Steering Table 1 above shows the type of instruction that is in even word instruction register 30. The "e" listed before the type of instruction indicates that it is the even instruction in a double word. The second column of Steering Table 1 above shows the type of instruction that is in odd word instruction register 31. The "o" listed before the type of instruction indicates that it is the odd instruction in a double word. The third column of Steering Table 1 shows the value of predecoded bit EFLOP for the double word stored in even word instruction register 30 and odd word instruction register 31. A "0" in the third column indicates predecode bit EFLOP is cleared. A "1" in the third column indicates predecode bit EFLOP is set. The fourth column of Steering Table 1 shows the value of predecoded bit AL01 for the double word stored in even word instruction register 30 and odd word instruction register 31. The fifth column of Steering Table 1 shows the value of predecoded bit AL02 for the double word stored in even word instruction register 30 and odd word instruction register 31. The sixth column of Steering Table 1 shows the value of predecoded bit NLE1 for the double word stored in even word instruction register 30 and odd word instruction register 31. The "X" values in the sixth column indicate that it does not matter whether bit NLE1 is cleared or set. The seventh column of Steering Table 1 shows the value of predecoded bit NLE2 for the double word stored in even word instruction register 30 and odd word instruction register 31; likewise, the "X" values in the seventh column indicate that it does not matter whether bit NLE2 is cleared or set. The eighth column of Steering Table 1 shows the instruction from column 1 or column 2 that is to be steered to ALU 36. An "X" value in the eighth column indicates that it does not matter which instruction is steered to ALU 36. The ninth column of Steering Table 1 shows the instruction from column 1 or column 2 that is to be steered to ALU 37. An "X" value in the ninth column indicates that it does not matter which instruction is steered to ALU 37. The tenth column of Steering Table 1 shows the instruction from column 1 or column 2 that is to be steered to FPU 35. An "X" value in the tenth column indicates that it does not matter which instruction is steered to FPU 35.
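For the aligned case, the mapping in Steering Table 1 from predecode bits to execution units can be sketched as a function. This illustrates the table under stated assumptions and is not the steering circuit: it takes the EFLOP, AL01 and AL02 bits plus the even word's category, returns which word ("even" or "odd") each unit receives, and uses None for a don't-care entry. The category input is a hypothetical addition, needed only for rows in which neither AL bit is set, where the listed bits alone do not say which ALU receives the even word.

```python
from typing import Dict, Optional


def steer_aligned(even_cat: str, eflop: bool, al01: bool,
                  al02: bool) -> Dict[str, Optional[str]]:
    """Unit -> "even", "odd" or None for a bundled aligned pair (state 101/103)."""
    units: Dict[str, Optional[str]] = {"ALU36": None, "ALU37": None, "FPU35": None}
    if al01:                                  # odd word -> ALU 36
        units["ALU36"] = "odd"
        units["FPU35" if eflop else "ALU37"] = "even"
    elif al02:                                # odd word -> ALU 37
        units["ALU37"] = "odd"
        units["FPU35" if eflop else "ALU36"] = "even"
    else:
        # Neither AL bit set: in Steering Table 1 the odd word is then a flop.
        units["FPU35"] = "odd"
        # Assumption: the even word's ALU follows from its category
        # (ldst runs only on ALU 37), which these bits do not encode.
        units["ALU37" if even_cat == "ldst" else "ALU36"] = "even"
    return units
```

For the e_ldst/o_alu row (AL01 set), this gives ALU 36 the odd word and ALU 37 the even word, matching the table.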

Steering Table 2 shows which instructions are executed by which of ALU 36, ALU 37 or FPU 35 when dual state machine 45 is in state 102 and the non-aligned instructions are bundled.
Steering Table 2

Instr. Register    Predecoded Bits                   Execution Unit
Odd 33   Even 30   EFLOP  AL01  AL02  NLE1  NLE2     ALU 36  ALU 37  FPU 35
o_alu    e_br      0      X     X     1     0        e_br    o_alu   X
o_alu    e_flop    1      X     X     0     0        X       o_alu   e_flop
o_alu    e_ldst    0      X     X     0     1        o_alu   e_ldst  X
o_ldst   e_alu     0      X     X     1     0        e_alu   o_ldst  X
o_ldst   e_mms     0      X     X     1     0        e_mms   o_ldst  X
o_ldst   e_br      0      X     X     1     0        e_br    o_ldst  X
o_ldst   e_flop    1      X     X     0     0        X       o_ldst  e_flop
o_flop   e_alu     0      X     X     1     0        e_alu   X       o_flop
o_flop   e_mms     0      X     X     1     0        e_mms   X       o_flop
o_flop   e_br      0      X     X     1     0        e_br    X       o_flop
o_flop   e_ldst    0      X     X     0     1        X       e_ldst  o_flop
o_mms    e_alu     0      X     X     0     1        o_mms   e_alu   X
o_mms    e_ldst    0      X     X     0     1        o_mms   e_ldst  X
o_mms    e_flop    1      X     X     0     0        o_mms   X       e_flop



The first column of Steering Table 2 above shows the type of instruction that is in odd word instruction register
33. The "o" listed before the type of instruction indicates that it is the odd instruction in a double word. The
second column of Steering Table 2 above shows the type of instruction that is in even word instruction register
30. The "e" listed before the type of instruction indicates that it is the even instruction in a double word. The
third, fourth, fifth, sixth and seventh columns of Steering Table 2 show the values of predecoded bits EFLOP,
AL01, AL02, NLE1 and NLE2, respectively, for the double word stored in even word instruction register 30 and
odd word instruction register 33. The eighth, ninth and tenth columns of Steering Table 2 show the instruction
from column 1 or column 2 that is to be steered to ALU 36, ALU 37 and FPU 35, respectively.
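In the rows of Steering Table 2 reproduced above, the odd instruction comes from register 33 and the even instruction from register 30 (a non-aligned pair), and AL01 and AL02 are always "X". For the rows with NLE1 or NLE2 set, the unit that receives the even instruction follows directly from those bits, matching the meanings of NLE1 and NLE2 given for Table 3 below. Sending the even instruction to FPU 35 when EFLOP is set follows cycle 7 of Table 4 below, where the even floating point instruction in register 30 goes to FPU 35 alongside o_mms from register 33. The Python sketch below encodes that reading; the function and data names are hypothetical, the order in which the bits are tested is an assumption for combinations the table does not list, and only the even instruction is placed, since the odd instruction's unit is not fixed by these bits alone.

```python
from typing import List, Optional, Tuple

def steer_even_non_aligned(eflop: int, nle1: int, nle2: int) -> Optional[str]:
    """Unit for the even instruction (register 30) of a non-aligned pair.

    Hypothetical helper. AL01 and AL02 are ignored because they are "X" in
    these rows; testing EFLOP before NLE1 and NLE2 is an assumption for bit
    combinations the table does not list.
    """
    if eflop:   # even instruction is a floating point operation
        return "FPU 35"
    if nle1:    # pair can be bundled, even instruction to ALU 36
        return "ALU 36"
    if nle2:    # pair can be bundled, even instruction to ALU 37
        return "ALU 37"
    return None  # no bundling indicated for this pair

# Order of the unit columns (columns 8 to 10 of Steering Table 2).
UNITS = ("ALU 36", "ALU 37", "FPU 35")

# Rows of Steering Table 2 with NLE1 or NLE2 set:
# (odd reg 33, even reg 30, EFLOP, NLE1, NLE2, ALU 36, ALU 37, FPU 35); None = "X".
ROWS: List[Tuple] = [
    ("o_flop", "e_alu", 0, 1, 0, "e_alu", None, "o_flop"),
    ("o_flop", "e_mms", 0, 1, 0, "e_mms", None, "o_flop"),
    ("o_flop", "e_br", 0, 1, 0, "e_br", None, "o_flop"),
    ("o_flop", "e_ldst", 0, 0, 1, None, "e_ldst", "o_flop"),
    ("o_mms", "e_alu", 0, 0, 1, "o_mms", "e_alu", None),
    ("o_mms", "e_ldst", 0, 0, 1, "o_mms", "e_ldst", None),
]

def rows_agree() -> bool:
    """True if the rule places each even instruction in the unit the table shows."""
    for _odd, even, eflop, nle1, nle2, *targets in ROWS:
        unit = steer_even_non_aligned(eflop, nle1, nle2)
        if unit is None or targets[UNITS.index(unit)] != even:
            return False
    return True

print(rows_agree())  # True
```

Because AL01 and AL02 never affect these rows, a lookup in this state needs only three of the predecode bits, which is one reason a per-state table is compact.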

Table 3 below illustrates predecoding bits for seven double words generated by predecode logic 44 as the 
seven double words are fetched from memory 11 and placed in instruction cache 13. 

Table 3

Even    Odd     Predecoded Bits
Word    Word    EFLOP  AL02  AL01  NLE2  NLE1  ALDUAL
alu     ldst    0      1     0     0     0     1
br      alu     0      0     0     0     1     0
flop    mms     1      0     1     0     0     1
ldst    sys     0      0     0     1     0     0
e_mms   o_mms   0      0     0     0     0     0
flop    br      1      0     1     0     0     1
sys     flop    0      0     0     0     0     0
For the first double word, predecode bit AL02 is set, indicating that the two aligned instructions of the double
word can be bundled and that the odd word instruction is steered to ALU 37. Predecode bit ALDUAL is also set,
indicating that the two aligned instructions of the double word are bundled.

For the second double word, predecode bit NLE1 is set, indicating that the two non-aligned instructions can
be bundled and that the even word instruction is steered to ALU 36.

For the third double word, predecode bit EFLOP is set, indicating that the even word instruction is a floating
point operation for an aligned double word. Predecode bit AL01 is also set, indicating that the two aligned
instructions of the double word can be bundled and that the odd word instruction is steered to ALU 36, and
predecode bit ALDUAL is set, indicating that the two aligned instructions are bundled.

For the fourth double word, predecode bit NLE2 is set, indicating that the two non-aligned instructions can be
bundled and that the even word instruction is steered to ALU 37.

For the fifth double word, no predecode bits are set indicating no bundling is possible. 

For the sixth double word, predecode bit EFLOP is set, indicating that the even word instruction is a floating
point operation for an aligned double word. Predecode bit AL01 is also set, indicating that the two aligned
instructions of the double word can be bundled and that the odd word instruction is steered to ALU 36, and
predecode bit ALDUAL is set, indicating that the two aligned instructions are bundled.

For the seventh double word, no predecode bits are set indicating no bundling is possible. 
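The bit meanings described above can be collected into a small decoder. The Python sketch below is a hypothetical illustration (none of its names come from the patent): it stores the six bits of each double word of Table 3 in the table's column order and reports which unit each bit selects. It does not model the dual state machine, it assumes the even word governed by NLE1 or NLE2 is paired with the odd word of the preceding double word (a reading of Table 4, not a statement in the text), and the ALDUAL value of the fourth double word is taken as 0.

```python
from typing import NamedTuple, Optional

class Predecode(NamedTuple):
    """Predecode bits kept with one double word, in Table 3 column order."""
    eflop: int
    al02: int
    al01: int
    nle2: int
    nle1: int
    aldual: int

def odd_unit(bits: Predecode) -> Optional[str]:
    """ALU for the odd word when the aligned pair can be bundled (AL02 -> 37, AL01 -> 36)."""
    if bits.al02:
        return "ALU 37"
    if bits.al01:
        return "ALU 36"
    return None

def non_aligned_even_unit(bits: Predecode) -> Optional[str]:
    """ALU for the even word when the non-aligned pair can be bundled (NLE1 -> 36, NLE2 -> 37)."""
    if bits.nle1:
        return "ALU 36"
    if bits.nle2:
        return "ALU 37"
    return None

# The seven double words of Table 3: ((even word, odd word), predecode bits).
TABLE_3 = [
    (("alu", "ldst"), Predecode(0, 1, 0, 0, 0, 1)),
    (("br", "alu"), Predecode(0, 0, 0, 0, 1, 0)),
    (("flop", "mms"), Predecode(1, 0, 1, 0, 0, 1)),
    (("ldst", "sys"), Predecode(0, 0, 0, 1, 0, 0)),
    (("e_mms", "o_mms"), Predecode(0, 0, 0, 0, 0, 0)),
    (("flop", "br"), Predecode(1, 0, 1, 0, 0, 1)),
    (("sys", "flop"), Predecode(0, 0, 0, 0, 0, 0)),
]

for (even, odd), bits in TABLE_3:
    print(even, odd, odd_unit(bits), non_aligned_even_unit(bits),
          "even is flop" if bits.eflop else "", "bundled" if bits.aldual else "")
```

Keeping the bits beside each double word in the instruction cache means this decoding work is done once, when the line is filled, rather than every time the instructions are fetched for execution.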

Table 4 below shows steering for the above seven double words during the first eight execution cycles.



Table 4

                   Instr. Register                   Execution Unit
Cycle  Dual State  Even 30  Odd 31  Even 32  Odd 33  ALU 36  ALU 37  FPU 35
1      1           alu      ldst    X        X       alu     ldst    X
2      1           br       alu     alu      ldst    br      X       X
3      2           flop     mms     br       alu     X       alu     flop
4      2           ldst     sys     flop     mms     mms     ldst    X
5      2           e_mms    o_mms   ldst     sys     sys     sys     X
6      3           flop     br      e_mms    o_mms   e_mms   X       X
7      2           flop     br      e_mms    o_mms   o_mms   X       flop
8      3           sys      flop    flop     br      br      X       X



The first column of Table 4 shows the cycle. The second column of Table 4 indicates the current state of dual
state machine 45. A value of "1" indicates that dual state machine 45 is in state 101, a value of "2" that it is in
state 102, a value of "3" that it is in state 103, and a value of "4" that it is in state 104. The third column
indicates the instruction placed in even word instruction register 30. The fourth column indicates the instruction
placed in odd word instruction register 31. The fifth column indicates the instruction placed in even word
instruction register 32. The sixth column indicates the instruction placed in odd word instruction register 33.
The seventh column shows the instruction from column 3, column 4, column 5 or column 6 that is to be steered
to ALU 36. An "X" value in the seventh column indicates that it does not matter which instruction is steered to
ALU 36. The eighth column shows the instruction from column 3, column 4, column 5 or column 6 that is to be
steered to ALU 37. The ninth column shows the instruction from column 3, column 4, column 5 or column 6 that
is to be steered to FPU 35.
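The register columns of Table 4 are consistent with a simple shift: whenever a new double word is loaded into registers 30 and 31, the previous contents of registers 30 and 31 move to registers 32 and 33, and in cycle 7 no new double word is loaded, so all four registers hold. This behavior is inferred from the table rather than stated in the text. The Python sketch below, with hypothetical names, replays that reading using the double words of Table 3 in order and the per-cycle load pattern read off Table 4; it does not model dual state machine 45 or the steering decision itself.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (even word, odd word); None stands for "X"

def register_trace(double_words: List[Pair], loads: List[bool]) -> List[Tuple[Pair, Pair]]:
    """Per-cycle contents of registers (30, 31) and (32, 33).

    Hypothetical model inferred from Table 4: a load shifts (30, 31) into
    (32, 33) and places the next double word in (30, 31); otherwise all hold.
    """
    current: Pair = (None, None)   # even word register 30, odd word register 31
    previous: Pair = (None, None)  # even word register 32, odd word register 33
    pending = iter(double_words)
    trace: List[Tuple[Pair, Pair]] = []
    for load in loads:
        if load:
            previous, current = current, next(pending)
        trace.append((current, previous))
    return trace

# Double words in the order of Table 3 (even word, odd word).
DOUBLE_WORDS = [("alu", "ldst"), ("br", "alu"), ("flop", "mms"), ("ldst", "sys"),
                ("e_mms", "o_mms"), ("flop", "br"), ("sys", "flop")]
# Table 4 shows no new double word in cycle 7, so no load in that cycle.
LOADS = [True, True, True, True, True, True, False, True]

# Register columns of Table 4, cycles 1 to 8: ((reg 30, reg 31), (reg 32, reg 33)).
TABLE_4 = [
    (("alu", "ldst"), (None, None)),
    (("br", "alu"), ("alu", "ldst")),
    (("flop", "mms"), ("br", "alu")),
    (("ldst", "sys"), ("flop", "mms")),
    (("e_mms", "o_mms"), ("ldst", "sys")),
    (("flop", "br"), ("e_mms", "o_mms")),
    (("flop", "br"), ("e_mms", "o_mms")),
    (("sys", "flop"), ("flop", "br")),
]

print(register_trace(DOUBLE_WORDS, LOADS) == TABLE_4)  # True
```

Holding the previous double word in registers 32 and 33 is what makes a non-aligned pair (odd word from register 33, even word from register 30) available for steering in the same cycle.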

The disclosures in United States patent application no. 08/194,899, from which this application claims pri- 
ority, and in the abstract accompanying this application are incorporated herein by reference. 




Claims 



1. A method of steering instructions to a first integer arithmetic logic unit (36), a second integer arithmetic
logic unit (37) and/or a floating point unit in a computer system comprising the steps of:

(a) fetching instructions from a main memory; 

(b) predecoding the instructions to generate predecode bits which indicate whether and how the in- 
structions may be bundled; 
(c) storing the instructions and the predecode bits in an instruction cache;

(d) fetching the instructions from the instruction cache with the predecode bits; and 

(e) steering on the basis of the predecode bits each of the instructions to one of the first integer arith- 
metic logic unit, the second integer arithmetic logic unit (37) and the floating point unit (35) for execution. 

2. A method as in claim 1, wherein in step (b) the predecode bits indicate whether two consecutive instruc-
tions may be bundled for execution.

3. A method as in claim 2, wherein in step (b) the predecode bits indicate whether two consecutive instruc-
tions which may be bundled for execution are non-aligned or aligned.

4. A method as in any preceding claim, wherein in step (b) the predecode bits identify for each pair of bundled
instructions to which of the first integer arithmetic logic unit (36), the second integer arithmetic logic unit
(37) and the floating point unit (35) a particular instruction is to be steered.

5. A method as in any preceding claim, wherein in step (a) the instructions are fetched from the main memory
two at a time in a double word.

6. A computer system comprising:

a main memory (11); 

an instruction cache (13); and 

a processor (12) coupled to the main memory (11) and the instruction cache (13), the processor (12) including:

memory interface means (26), coupled to the main memory (11), for fetching instructions from the 
main memory (11), 

predecoding means (40,41,43,44), coupled to the memory interface means (26), for predecoding 
the instructions to generate predecode bits, the predecode bits indicating whether and how the instruc- 
tions may be bundled, 

interface means (24), coupled to the predecoding means (40,41,43,44) and the instruction cache 
(13), for storing the instructions and the predecode bits in the instruction cache (13) and for fetching the 
instructions from the instruction cache (13) with the predecode bits, 

a first arithmetic logic unit (36), 

a second arithmetic logic unit (37), and 

steering means (34,45), coupled to the interface means (24), the first arithmetic logic unit (36) and 
the second arithmetic logic unit (37), for steering each of the instructions to one of the first integer arith- 
metic logic unit (36) and the second integer arithmetic logic unit (37) for execution, the steering means 
(34,45) utilizing the predecode bits to steer the instructions. 

7. A computer system as in claim 6, wherein the predecode bits generated by the predecoding means
(40,41,43,44) are able to identify for each pair of bundled instructions to which of the first integer arith-
metic logic unit (36) and the second integer arithmetic logic unit (37) a particular instruction is to be
steered.

8. A computer system as in claim 6 or 7, wherein the predecoding means (40,41,43,44) includes:

a first predecode register (40) for holding an even word instruction of an instruction pair currently being
decoded;

a second predecode register (41) for holding an odd word instruction of the instruction pair currently being
decoded; and

a third predecode register (43) for holding an odd word instruction of an instruction pair previously pre-
decoded.

9. A computer system as in claim 6, 7 or 8, wherein the steering means (34,45) includes a state machine (45),
a current state of the state machine (45) determining which of the predecode bits the steering means
(34,45) utilizes to steer the instructions.

10. A computer system as in any one of claims 6 to 9, wherein the processor (12) includes a floating point
unit (35) coupled to the steering means (34,45), the steering means (34,45) being operative to steer float-
ing point instructions to the floating point unit (35).